Recognizing, Naming and Exploring Structure in RDF Data

Author

  • Linnea Passing
Abstract

The Resource Description Framework (RDF) is the de facto standard for representing semantic data, employed, for example, in the Semantic Web and in data-intensive domains such as the life sciences. Data in the RDF format can be handled efficiently using relational database systems (RDBMSs), because decades of research in RDBMSs have led to mature techniques for storing and querying data. Previous work focused merely on the performance gain achieved by leveraging RDBMS techniques, but did not take other advantages into account, such as providing a SQL-based interface to the dataset and exposing relationships. In contrast, our approach is the first to strive for a complete transformation of RDF data into the relational data model. For that purpose, inherently unstructured RDF data is structured by means of semantic information, and relationships between these structures are extracted. Moreover, names for structures, their attributes, and relationships are automatically generated. Subsequently, using the relational schema thus created, RDF data is physically stored in efficient data structures. Afterwards, it can be queried with high performance and, because of the generated names, also be presented to users. Our experiments show that structures exist even within Web-crawled RDF data, which is considered dirty. Using our algorithms, we can represent 79% of the DBpedia dataset (the machine-readable part of Wikipedia) using only 140 tables. Furthermore, our survey shows that the generated table names receive an average score of 4.6 on a 5-point Likert scale (1 = bad, 5 = excellent). Our approach therefore enables users to gain a fast and simple overview of large amounts of seemingly unstructured RDF data by viewing the extracted relational model.
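The idea of structuring RDF data into tables, as described in the abstract, can be illustrated with a minimal sketch. It is not the author's algorithm: it assumes a simple heuristic in which subjects that share exactly the same set of predicates are placed in the same table. The function name `group_by_predicate_set`, the `ex:` identifiers, and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_by_predicate_set(triples):
    """Group subjects into candidate tables keyed by their predicate set.

    Assumption (not from the source): every predicate is single-valued,
    so a later triple overwrites an earlier one for the same subject
    and predicate.
    """
    # Collect the properties of each subject: subject -> {predicate: object}
    properties = defaultdict(dict)
    for subject, predicate, obj in triples:
        properties[subject][predicate] = obj

    # Subjects with an identical predicate set become rows of one table.
    tables = defaultdict(list)
    for subject, row in properties.items():
        tables[frozenset(row)].append({"subject": subject, **row})
    return dict(tables)


# Hypothetical sample data: two cities and one person.
triples = [
    ("ex:Berlin", "ex:name", "Berlin"),
    ("ex:Berlin", "ex:population", 3500000),
    ("ex:Paris", "ex:name", "Paris"),
    ("ex:Paris", "ex:population", 2100000),
    ("ex:Ada", "ex:name", "Ada Lovelace"),
    ("ex:Ada", "ex:bornIn", "ex:London"),
]

tables = group_by_predicate_set(triples)
for columns, rows in tables.items():
    print(sorted(columns), len(rows))
```

In this sketch the two cities end up in one table and the person in another. Real Web-crawled data would need tolerance for missing or multi-valued attributes, which this heuristic deliberately ignores.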


Similar resources

Structure of Wavelet Covariance Matrices and Bayesian Wavelet Estimation of Autoregressive Moving Average Model with Long Memory Parameter’s

In the process of exploring and recognizing statistical communities, the analysis of data obtained from these communities is considered essential. One appropriate method for data analysis is the structural study of the function fitted to these data. Wavelet transformation is one of the most powerful tools in the analysis of these functions, and the structure of wavelet coefficients is very impor...


I4-D14 Exploring and Taming Existence in Rule-based RDF Queries

RDF is an emerging knowledge representation formalism proposed by the W3C. A central feature of RDF is blank nodes, which allow one to assert the existence of an entity without naming it. Despite the importance of blank nodes for RDF, many existing RDF query languages have only insufficient support for blank nodes. We propose a query language for RDF, called RDFLog, with extensive blank node supp...


Information Modelling using RDF: Constructs for Modular Description of Complex Systems

This paper describes some experimental work for modelling complex systems with RDF. Basic RDF represents information at a very fine level of granularity. The thrust of this work is to build higher-level constructs in RDF that allow complex systems to be modelled incrementally, without necessarily having full knowledge of the detailed ontological structure of the complete system description. The...


Explorator: A tool for exploring RDF data through direct manipulation

In this paper we introduce Explorator, a tool for exploring the Semantic Web data by direct manipulation. Explorator implements a model of operations that is supported by a visual interface that enables the user, with minimal knowledge of RDF model, to explore an RDF database without a-priori knowledge of data domain. Consequently, it is well suited for tasks that involve information search, ex...


Materialized View-Based Processing of RDF Queries

The increasing interest in the RDF data model has turned the efficient processing of queries over RDF datasets into a challenging and crucial task. Indeed, the triple format of the RDF data model, along with the lack of structure that characterizes it, raises new challenges in data management in terms of both performance and scalability. In this paper, we consider improving the performance of RDF ...



Publication date: 2014